A Corpus-based Syntactic Lexicon for Adverbs
Abstract
Adverbs, a word class often neglected in the field of NLP resources, have recently been described in a computational lexicon produced at CST as one of the results of a Ph.D. project. The adverb lexicon, which is integrated in the Danish STO lexicon, gives detailed syntactic information on the type of modification and position, as well as on other syntactic properties of approx. 800 Danish adverbs. One of the aims of the lexicon has been to establish a clear distinction between syntactic and semantic information: where other lexicons often generalize over the syntactic behavior of semantic classes of adverbs, here every adverb is described with respect to its own syntactic behavior in a text corpus, revealing highly individual syntactic properties. Syntactic information on adverbs is needed in NLP systems that generate text, to ensure correct placement in the phrase they modify. Systems that analyze text also need this information in order to attach adverbs to the right node in the syntactic parse tree. Within the field of linguistic research, several results can be deduced from the lexicon, e.g. knowledge of syntactic classes of Danish adverbs.

A lexicon for Danish adverbs has recently been produced as one of the results of a Ph.D. project financed by the Nordic language technology research programme 2000-2004 (for more information see www.norfa.no). The lexicon is integrated in STO, which is a national Danish follow-up to the former EU-funded lexicon projects PAROLE and SIMPLE (see http://www.ub.es/gilcub/SIMPLE/simple.html#Language), and gives syntactic information on the type of modification and position as well as on several other syntactic lexical properties of approx. 800 Danish adverbs, selected on the basis of their frequency in a text corpus. The lexical information is based on a series of syntactic tests as well as an individual examination of each adverb in a newspaper corpus of 30 million tokens ("Berlingske Aviskorpus", Berlingske Tidende & Weekendavisen 1999). The lexicon, which can be used in NLP systems as well as for linguistic research, differs in several ways from earlier large-scale computational adverb lexicons. First, it is established by a corpus-based study of the syntactic behavior of each adverb; second, it focuses on properties which can be tested purely syntactically, in order to keep a sharp distinction in the lexicon between syntax and semantics. Semantic information on the adverbs is planned to be described afterwards at a semantic level in the STO lexicon, with links …
Related articles
A Corpus-based Analysis of Epistemic Stance Adverbs in Essays Written by Native English Speakers and Iranian EFL Learners
Academic essays entail taking a stance on the truth value of propositions. Epistemic adverbs deal with the speaker's assessment of the truth value of propositions. Employing a corpus-based approach with descriptive statistics and qualitative description, this study explored the use of epistemic stance adverbs in academic essays written by native English speakers and Iranian EFL learners. Follow...
Feature extraction in opinion mining through Persian reviews
Opinion mining analyzes user reviews to extract opinions, sentiments and demands in a specific area, which can play an important role in making major decisions in that area. In general, opinion mining extracts user reviews at three levels of document, sentence and feature. Opinion mining at the feature level is taken into consideration more than the other two levels d...
Coordination of -mente Ending Adverbs in Portuguese: An Integrated Solution
Portuguese -mente ending adverbs constitute a large, morphologically homogeneous, but syntactically and semantically diverse lexical set. When coordinated, the first adverb loses the adverbial suffix and takes the shape of the base adjective, in the feminine-singular form. This raises the issue of its part-of-speech (POS) classification (adverb or adjective?), but especially its adequate parsing,...
Extending the adverbial coverage of a French morphological lexicon
We present an extension of the adverbial entries of the French morphological lexicon DELA (Dictionnaires Electroniques du LADL / LADL electronic dictionaries). Adverbs were extracted from LGLex, an NLP-oriented syntactic resource for French, which in turn contains all adverbs extracted from the Lexicon-Grammar tables of both simple adverbs ending in -ment (i.e., '-ly') (Molinier and Levrier,...
Evidently epistential adverbs are argumentative indicators: A corpus-based study
Argumentative indicators of discourse relations constitute crucial cues for the mining of arguments. However, a comprehensive lexicon of these linguistic devices is so far lacking due to the scarcity of corpora argumentatively annotated and the absence of an empirically validated analytic methodology. Recent studies have shown that modals, that express that things might be otherwise, and eviden...